Skip to content

PHPC-2474: BinaryVector support #1853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: v2.x
Choose a base branch
from

Conversation

jmikola
Copy link
Member

@jmikola jmikola commented Jul 24, 2025

Copy link
Member

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the python implementation, the Binary class has a from_vector static method. You have certainly considered this solution, of having only a factory method instead of a new class. But the new methods toArray and getVectorType are specific to this binary type, so having a dedicated class makes sense. But if you create a new Binary($raw, Binary::TYPE_VECTOR) instance with the vector type from the raw binary data, you don't have this methods. The current state doesn't allow creating a BinaryVector from raw data. I think you would have to wrap in into a BSON Document and encode/decode it to get a BinaryVector instance. I don't find any other binary type that would require a specific class.

I see 3 possible implementations:

  1. Add Binary::fromVector(array $vector, VectorType $vectorType) and toArray() to the existing Binary class
  2. Create BinaryVector class that extends Binary (current state of this PR).
  3. Create BinaryVector class that implements BinaryInterface, \JsonSerializable, Type, \Stringable.

We need to compare pro/con of each solution.

*/
public const VECTOR_TYPE_PACKED_BIT = UNKNOWN;

final public function __construct(array $vector, int $type) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to specify an advanced type in this stub? It should be @param list<int|float> $vector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that if it'd help Psalm or other static analysis, but it'd have no effect for generated stubs.

*/
public const VECTOR_TYPE_PACKED_BIT = UNKNOWN;

final public function __construct(array $vector, int $type) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the $type param confusing as the parent constructor has a param with the same name but a different meaning. It should be $vectorType.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can change to $vectorType when moving to Binary::fromVector().


namespace MongoDB\BSON;

class BinaryVector extends Binary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By extending the Binary class you inherit from the Binary::TYPE_* constants that are not accurate for this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good argument for not extending.

Comment on lines +22 to +27
typedef enum {
PHONGO_BSON_VECTOR_TYPE_INVALID = 0,
PHONGO_BSON_VECTOR_TYPE_INT8 = 0x03,
PHONGO_BSON_VECTOR_TYPE_FLOAT32 = 0x27,
PHONGO_BSON_VECTOR_TYPE_PACKED_BIT = 0x10,
} phongo_bson_vector_type_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can use an enum in the C code, you should be able to use a PHP userland enum for the vector type. The constructor signature would be more strict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second suggestion you had was to place the Binary::fromVector() factory method on the enum class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum method would look like this:

enum VectorType
{
    case Int;
    case Float;
    case PackedBit;

    public function createBinaryVector(array $vector): Binary
    {
        return Binary::fromVector($vector, $this);
    }
}

Usage:

VectorType::Int->createBinaryVector([1, 2, 3]);

@@ -308,3 +316,250 @@ bool phongo_binary_new(zval* object, const char* data, size_t data_len, bson_sub

return true;
}

/* MongoDB\BSON\BinaryVector implementation */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't it in a new file src/BSON/BinaryVector.c?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We depended on so many Binary.c functions (many of which are static) that it was simpler to share the same file rather than export those symbols.

@@ -7,7 +7,7 @@

namespace MongoDB\BSON;

final class Binary implements BinaryInterface, \JsonSerializable, Type, \Stringable
class Binary implements BinaryInterface, \JsonSerializable, Type, \Stringable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classes does not define any method that is not in one of the interfaces. This means you don't need to make it extensible: the new BinaryVector class could implement all this interfaces (re-using the underlying C functions).

In the library, I used MongoDB\BSON\Binary type in the builder, but I would change it to use BinaryInterface. Why this interface doesn't extend MongoDB\BSON\Type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interfaces were never very useful for PHPC. One of the original use cases was have an embedded object masquerade as a UTCDateTime and incorporate a timezone, but the benefits are mainly for userland applications.

I think sharing a single Binary class is ideal for anything that would expect the BSON type, and avoiding a separate BinaryVector child class seems like it'd make things easier for users (and simplify what comes back from BSON-to-PHP conversion).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants